Abstract

Decomposing a total causal effect into natural direct and indirect effects is central to 
revealing causal mechanisms. Conventional methods achieve the decomposition by 
specifying an outcome model as a linear function of the treatment, the mediator, and the 
observed covariates under identification assumptions including the assumption of no 
interaction between treatment and mediator. Recent statistical advances relax this 
assumption typically within the linear or nonlinear regression framework with few 
exceptions. I propose a non-parametric approach that also relaxes the assumption of no
treatment-mediator interaction while avoiding the problems of outcome model 
specification that become particularly acute in the presence of a large number of 
covariates. The key idea is to estimate the marginal mean of each counterfactual outcome 
by weighting every experimental unit such that the weighted distribution of the mediator 
under the experimental condition approximates their counterfactual mediator distribution 
under the control condition. The weight to be applied for this purpose is a ratio of the 
conditional probability of a mediator value under the control condition to the conditional 
probability of the same mediator value under the experimental condition. A non- 
parametric approach to estimating the weight on the basis of propensity score 
stratification promises to increase the robustness of the direct and indirect effect 
estimates. The outcome is modeled as a function of the direct and indirect effects with 
minimal model-based assumptions. In contrast with the regression-based approaches, this 
new method applies regardless of the distribution of the outcome or the functional 
relationship between the outcome and the mediator, and is suitable for handling a large 
number of pretreatment covariates. Under modified identification assumptions, the 
weighting method also makes adjustment for post-treatment covariates, a benefit 
apparently unavailable in the existing methods. 

Key Words: Causal inference; Counterfactual outcomes; Identification; Mediation; 
Non-parametric method; Post-treatment covariates; Treatment-mediator interaction. 


1. Introduction 

The goal of many scientific investigations is to not just examine whether input A would 
generate output Y but also discern among competing theories explaining why A causes Y.
If A changes Z which subsequently changes Y, Z is considered to be a mediator. The 
remaining effect of A on Y that has not been channeled through Z has been called the 
direct effect. For example, we might ask whether attending a federally funded Head Start 
program improves the attention skills of children from low-income families and thereby 
reduces their likelihood of repeating kindergarten. A major advancement in the recent
statistical literature on mediation has been the clarification of conceptual distinctions 
between controlled direct effects and natural direct effects (Pearl, 2001; Robins &
Greenland, 1992). A controlled direct effect of attending a Head Start program on
reducing the likelihood of kindergarten retention is conceivable if all low-income
children would display the same level of attention at kindergarten entry as a result
of a second intervention regardless of their prior preschool experience. In contrast, a
natural direct effect represents the change in retention status attributable to attending a 
Head Start program when the child’s attention skills at kindergarten entry would have 
counterfactually remained unchanged by the Head Start attendance. In many situations, 
scientific questions about natural direct and indirect effects are especially relevant for 
understanding causal mechanisms. 

In general, if the controlled direct effect of the treatment on the outcome does not depend 
on any mediator value, under identification assumptions explicated by previous 
researchers (Holland, 1988; Pearl, 2001; Robins & Greenland, 1992; Sobel, 2008), the 
natural direct effect and the controlled direct effect will become equal. Researchers can 
rely on path analysis, structural equation modeling, or similar regression models to obtain 
estimates of natural direct and indirect effects by invoking model-based assumptions. 
However, in the cases where the treatment and the mediator have an interaction effect on 
the outcome, the standard methods no longer apply. A number of alternative methods 
have been proposed recently for estimating the natural direct effect and the natural 
indirect effect. They typically involve specifying an outcome model as a function of the 
treatment, the mediator, and the observed covariates (Pearl, 2010; Petersen et al, 2006; 
VanderWeele, 2009). In addition, these methods require model-based assumptions about 
the association between the outcome and the mediator. Imai, Keele, and Yamamoto 
(2010) have developed a non-parametric procedure for estimating the natural indirect 
effect. Robins (2003) argued that, in the presence of post-treatment covariates, estimation of
natural direct and indirect effects requires the assumption of no treatment-by-mediator
interaction. Hence none of the existing methods is capable of handling post-treatment
covariates that confound the mediator-outcome relationship when there is a
treatment-by-mediator interaction.

I propose an alternative non-parametric approach that allows for a simultaneous 
estimation of the natural direct and indirect effects regardless of whether there is a 
treatment-by-mediator interaction effect on the outcome. I use weighting to not only 
adjust for selection bias associated with observed confounders but also approximate the 
counterfactual mediator distribution associated with an alternative treatment condition. 
The weighted outcome is modeled only as a function of the direct and indirect effects of 
interest. Hence, it does not require any parametric specification of the functional 
relationships between the outcome and the treatment, between the outcome and the 
mediator, or between the outcome and the observed covariates. Because the outcome 
model is non-parametric, this analytic approach is applicable regardless of the 
distribution of the outcome, the distribution of the mediator, or the functional relationship 
between the two. I show that in the absence of post-treatment covariates, the weight can 
be derived easily under standard identification assumptions. In the presence of post- 
treatment covariates, a modification of the standard assumptions and an addition of 
another assumption enable us to derive the weight that adjusts for selection bias 
associated with both pretreatment and post-treatment covariates. 

I organize this paper as follows. Section 2 introduces notation in the context of an
illustrative application in which natural direct and indirect effects are of particular 
interest. Section 3 reviews the existing standard and alternative approaches to estimating 
the natural effects. Section 4 lays out the theoretical rationale for the new approach to 
estimating natural direct and indirect effects. This section also explains the analytic 
procedure in the context of the application example. Section 5 summarizes the major 
features of the new approach and raises issues for further inquiry. 

2. Definition of Natural Direct and Indirect Effects: An Illustrative Application


2-1 Notation 

To illustrate in a simple setting, let A = 1 if a child has attended a Head Start program 
and let A = 0 represent the experience of growing up in a non-Head Start setting during 
the preschool years. Let us suppose that all schools use the same screening criteria to 
determine whether a new kindergartner is displaying attention skills at either a low level 
or a high level denoted by Z = 0 and Z = 1, respectively. The outcome is denoted by Y, 
with Y = 1 indicating that a child is retained in kindergarten and Y = 0 indicating that 
the child is not retained. Adopting Rubin’s Causal Model (Holland, 1986, 1988; Rubin,
1978) and invoking the stable unit treatment value assumption (SUTVA) for simplicity
(Rubin, 1986), I use $Z_a$ to denote a child’s potential mediator and $Y_{aZ_a}$ for the child’s
potential final outcome. We would observe a child’s attention skills at the level $Z_1$ if the
child has attended a Head Start program and $Z_0$ otherwise. If the child has obtained a
high attention level after attending a Head Start program, the child would show retention
status $Y_{11}$; if the child has obtained a low attention level after attending a Head Start
program, the child would show retention status $Y_{10}$ instead. If the child has grown up in a
non-Head Start setting and nonetheless has obtained a high attention level, the child
would show retention status $Y_{01}$; otherwise, if the child has grown up in a non-Head Start
setting and has obtained a low attention level, the child would show retention status $Y_{00}$.
Additional counterfactual outcomes are denoted by $Y_{aZ_{a'}}$. For example, $Y_{1Z_0}$ represents
the child’s retention status associated with attending a Head Start program yet the child’s
attention skills were set at the level that the child would have displayed without attending
Head Start.

2-2 Controlled Direct Effects 

Following Pearl’s (2001) framework, the controlled direct effect is defined by
$Y_{1z} - Y_{0z}$, $z \in \{0, 1\}$, for a given child. However, it is possible that in a subset of the
population, the controlled level $z$ is different from these children’s potential mediator
values $Z_1$ and $Z_0$. Hence the question about the controlled direct effects has low practical
relevance in this case.

2-3 Natural Direct Effects and Natural Indirect Effects 

The natural direct effect is defined by $Y_{1Z_0} - Y_{0Z_0}$ for a given child. Here the question
becomes whether a Head Start program can reduce the likelihood of grade retention
without raising low-income children’s attention skills. If assigned to a non-Head Start
setting, children’s attention level $Z_0$ at kindergarten entry varies naturally in the
population. Hence, to take the expectation of the natural direct effect requires a
summation of the controlled direct effects over the distribution of $Z_0$, that is,
$E(Y_{1Z_0} - Y_{0Z_0}) = \sum_z E(Y_{1z} - Y_{0z}) \, \mathrm{pr}(Z_0 = z)$ (Pearl, 2001). The natural indirect
effect, defined by $Y_{1Z_1} - Y_{1Z_0}$ for a given child, is the change in retention status caused
only by the Head Start-induced change in the child’s attention skills from $Z_0$ to $Z_1$ at
kindergarten entry. The total effect of attending a Head Start program vs. a non-Head Start
setting on retention status is the sum of the natural indirect effect and the natural direct
effect: $Y_{1Z_1} - Y_{0Z_0} = (Y_{1Z_1} - Y_{1Z_0}) + (Y_{1Z_0} - Y_{0Z_0})$. I will focus my discussion on how to
analyze the natural direct and indirect effects as defined above.
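
As a minimal sketch of these definitions, the Python fragment below computes the natural direct, natural indirect, and total effects for a hypothetical toy population in which every child's potential mediators and potential outcomes are assumed known. That is never the case with real data; the table of values is invented purely to show that the decomposition holds child by child.

```python
# Hypothetical toy population (all values invented for illustration).
# Z0, Z1: potential mediators; Y[(a, z)]: potential outcome under treatment a
# with the mediator set to z.
children = [
    {"Z0": 0, "Z1": 1, "Y": {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 0}},
    {"Z0": 0, "Z1": 0, "Y": {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}},
    {"Z0": 1, "Z1": 1, "Y": {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}},
    {"Z0": 0, "Z1": 1, "Y": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}},
]


def effects(child):
    """Return (natural direct, natural indirect, total) effect for one child."""
    y, z0, z1 = child["Y"], child["Z0"], child["Z1"]
    nde = y[(1, z0)] - y[(0, z0)]      # Y_{1Z_0} - Y_{0Z_0}
    nie = y[(1, z1)] - y[(1, z0)]      # Y_{1Z_1} - Y_{1Z_0}
    total = y[(1, z1)] - y[(0, z0)]    # Y_{1Z_1} - Y_{0Z_0}
    return nde, nie, total


def average_effects(population):
    """Average each of the three effects over the population."""
    n = len(population)
    sums = [0, 0, 0]
    for child in population:
        for k, value in enumerate(effects(child)):
            sums[k] += value
    return tuple(s / n for s in sums)
```

For every child the third element returned by `effects` equals the sum of the first two, which is the decomposition stated above.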

2-4 Pretreatment and Post-Treatment Covariates

Let X denote a set of pretreatment covariates that may confound the treatment-mediator 
relationship, the treatment-outcome relationship, or the mediator-outcome relationship. 
For example, minority status is a pretreatment covariate that may influence a child’s 
participation in a Head Start program, the child’s attention level at kindergarten entry, as 
well as the child’s likelihood of repeating kindergarten. Let $L_a$ denote a set of
post-treatment covariates that may result from having been assigned to treatment $a$ and may
subsequently confound the mediator-outcome relationship. For example, Head Start may 
improve children’s physical well-being which may in turn raise their attention level and 
may also prevent school absence due to health problems in kindergarten and thereby 
reduce the likelihood of grade retention. 

3. Existing Approaches to Estimating Natural Direct and Indirect Effects 

3-1 Path Analysis: the Standard Approach

In this section I review the assumptions under which currently available methods 
generate consistent estimates of the natural direct and indirect effects defined above. 
Social scientists have employed path analysis as a primary tool for identifying mediators
of treatment effects on final outcomes (Baron & Kenny, 1986; Duncan, 1966). The 
analysis typically involves two linear regression models for a continuous outcome Y and 
a continuous mediator Z each specified as a linear function of treatment assignment A : 

$Z = \alpha_Z + dA + \varepsilon_Z$, $Y = \alpha_Y + bZ + cA + \varepsilon_Y$,

where $\varepsilon_Z$ and $\varepsilon_Y$ are structural errors. Researchers usually interpret c as the direct effect
of A on Y and interpret the product bd as the indirect effect of A on Y mediated by Z. In
this framework, the total effect of A on Y, which can be obtained by regressing Y on A, is 
equal to bd + c. 
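
The identity between the total effect and bd + c can be checked numerically. The sketch below is an illustration only: the simulated data, the coefficient values, and the helper names `solve` and `ols` are assumptions of this example, not part of any cited procedure. It fits the two path models and the regression of Y on A by ordinary least squares in plain Python.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


# Simulated data under the linear path model (assumed values d=0.8, b=0.6, c=0.3).
rng = random.Random(0)
n = 2000
A = [rng.randint(0, 1) for _ in range(n)]
Z = [0.5 + 0.8 * a + rng.gauss(0, 1) for a in A]
Y = [1.0 + 0.6 * z + 0.3 * a + rng.gauss(0, 1) for z, a in zip(Z, A)]

_, d_hat = ols([A], Z)             # mediator model: Z on A
_, b_hat, c_hat = ols([Z, A], Y)   # outcome model: Y on Z and A
_, total_hat = ols([A], Y)         # total effect: Y on A
```

In least squares the fitted total effect equals `b_hat * d_hat + c_hat` up to rounding error, which mirrors the path-analytic decomposition. The identity says nothing about whether the coefficients have a causal meaning; that depends on the identification assumptions discussed next.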

Using the potential outcomes framework, researchers (Holland 1988; Imai et al, 2010; 
Pearl, 2001; Robins and Greenland, 1992; Sobel, 2008; VanderWeele & Vansteelandt, 
2009) have clarified the identification assumptions under which the above path 
coefficients have causal meanings. Suppose that pretreatment covariates X can possibly 
be controlled for through either research design or statistical adjustment. The assumptions 
include, for all values of a, a', z, and z': 

Assumption 1 (Nonzero probability of treatment assignment). $0 < \mathrm{pr}(A = a \mid X) < 1$.

Assumption 2 (Nonzero probability of mediator value assignment within a treatment).
$0 < \mathrm{pr}(Z_a = z \mid A, X) < 1$.

Assumption 3 (No confounding of treatment-outcome relationship). $Y_{az} \perp\!\!\!\perp A \mid X$.

Assumption 4 (No confounding of mediator-outcome relationship within the actual
treatment condition). $Y_{az} \perp\!\!\!\perp Z_a \mid A = a, X$.

Assumption 5 (No treatment-by-mediator interaction). $E(Y_{az} - Y_{a'z} \mid X) =
E(Y_{az'} - Y_{a'z'} \mid X)$.

When the functional form of the linear structural model is correctly specified, the above 
assumptions suffice for evaluating the controlled direct effect represented by path 
coefficient c. 

Even when all the above assumptions hold, to estimate natural direct and indirect effects 
with the standard methods requires additional identification assumptions (Pearl, 2001): 

Assumption 6 (No confounding of treatment-mediator relationship). $Z_a \perp\!\!\!\perp A \mid X$.

Assumption 7 (No confounding of mediator-outcome relationship across treatment
conditions). $Y_{az} \perp\!\!\!\perp Z_{a'} \mid A = a, X$.

Under Assumptions 1 to 7, c represents the natural direct effect, and bd represents the
natural indirect effect. 

The standard methods have some major limitations. When there is a treatment-by- 
mediator interaction, the controlled direct effect will depend on the fixed level of the 
mediator. Path coefficient c no longer represents the natural direct effect except when d 
is zero, that is, when Z does not qualify as a mediator between A and Y. Path analysis can
make covariance adjustment for a rather limited number of pretreatment covariates and is 
prone to misspecifications of the functional form of the outcome model and the mediator 
model. In addition, when post-treatment covariates confound the mediator-outcome 
relationship, the standard methods through regression cannot adjust for such covariates 
without biasing the estimate of the causal effect of A on Z and that of the direct effect of 
A on Y (Rosenbaum, 1984). 

3-2 Alternative Methods for Estimating Natural Direct and Indirect Effects 

The assumption that treatment and mediator do not interact in causing the outcome is 
often implausible. Statisticians have recently proposed alternative approaches that relax 
the no-interaction assumption in estimating natural direct and indirect effects. These 
include the modified regression approach (Petersen et al, 2006) and the conditional 
structural models approach (VanderWeele, 2009). These new approaches represent an 
important advance in the research methodology for mediation. 

Petersen, et al (2006) modified the regression approach to estimating natural direct 
effects by directly incorporating an interaction between treatment and mediator in the 
outcome model. This approach still requires Assumptions 1 to 4 and 6. However, they 
replaced Assumptions 5 and 7 with “the direct effect assumption”, that is, the controlled
direct effect at the fixed value $z$ does not depend on $Z_0$: $E(Y_{az} - Y_{0z} \mid A, X) =
E(Y_{az} - Y_{0z} \mid Z_0 = z, A, X)$. For the direct effect assumption to hold, it is usually
necessary to identify pretreatment characteristics that predict both $Z_0$ and $Y_{az} - Y_{0z}$. The
above assumptions suffice only for estimating the natural direct effect. To estimate the 
natural indirect effect, one must subtract the estimated natural direct effect from the 
estimated total effect. The estimation of the latter requires a different assumption that is 
generally stronger than Assumption 3. 

Assumption 8 (Independence of treatment assignment and potential outcomes).
$Y_{aZ_a}, Y_{a'Z_{a'}}, Y_{aZ_{a'}}, Y_{a'Z_a} \perp\!\!\!\perp A \mid X$.

This assumption usually holds when treatment assignment is randomized or approximates 
a randomized experiment within levels of the pretreatment covariates. 

Petersen et al’s analytic procedure involves three major steps. Step 1 is to analyze a 
multiple regression of the outcome on the mediator, treatment, pretreatment covariates, 
and their interactions, $E(Y \mid A, X, Z)$, and obtain the corresponding regression coefficient
estimates, from which one can compute the controlled direct effect when $Z = z$ as a
function of $X$ and test the null hypothesis of no direct effect. Step 2 is to analyze a
multiple regression of the mediator on the treatment and pretreatment covariates,
$E(Z \mid A, X)$, from which one obtains an estimate of $E(Z_0 \mid X)$ when $A = 0$ and then
computes the sample estimates of $E(X)$, $E(Z_0)$, and $E(Z_0 X)$. Step 3 is to compute the
average direct effect of the treatment on the outcome, $E(Y_{1Z_0} - Y_{0Z_0}) = \beta_1 + \beta_2 E(X) +
\beta_3 E(Z_0) + \beta_4 E(Z_0 X)$. Similar to the standard approach of path analysis, the modified
regression approach requires linearity of the model for Y and of the model for Z. The
regression models become cumbersome when there are a relatively large number of 
pretreatment covariates. 

To estimate both natural direct effects and natural indirect effects, VanderWeele (2009) 
employed two conditional structural models similar to the regression models specified in 
Petersen et al’s (2006) steps 1 and 2. The conditional structural models approach invokes 
Assumptions 1, 2, 4, 6, 7, and 8. To relax Assumption 5, this approach again includes an 
interaction between the treatment and mediator in the outcome model. Yet several major 
differences distinguish these two approaches. Instead of using the direct effect 
assumption, the conditional structural models are made conditional on a subset of 
pretreatment covariates that are required for meeting Assumption 7. Let $\Omega$ denote the
entire set of pretreatment covariates and let $X_{(vii)} \subseteq \Omega$ represent the subset of
pretreatment covariates required for meeting Assumption 7. The conditional structural
models are $E(Y_{az} \mid X_{(vii)} = x_{(vii)}) = g(a, z, x_{(vii)})$ for the effects of the treatment and
mediator on the outcome and $E(Z_a \mid X_{(vii)} = x_{(vii)}) = h(a, x_{(vii)})$ for the effect of the
treatment on the mediator. The confounding effects of all pretreatment covariates $X$ in $\Omega$
are adjusted through inverse-probability-of-treatment weighting. When the above-mentioned
identification assumptions as well as the model-based assumptions hold, a
weighted analysis of the two conditional structural models generates consistent estimates
of the coefficients. Further assuming that $g(a, z, x_{(vii)})$ is a linear function of $z$, one can
obtain an estimate of the counterfactual outcome $E(Y_{aZ_{a'}} \mid X_{(vii)} = x_{(vii)}) =
g(a, h(a', x_{(vii)}), x_{(vii)})$. The result is consistent with that derived by Petersen et al
(2006).

A potential advantage of the conditional structural models approach, when compared 
with the modified regression approach, is that the inverse-probability-of-treatment 
weighting strategy enables adjustment for a relatively large number of pretreatment 
covariates. However, as the subset of covariates related to Assumption 7 increases, the 
above advantage vanishes and the conditional structural models become cumbersome. 
The conditional structural models approach and the modified regression approach share a 
number of other limitations. Both strategies require combining parameters estimated from 
the outcome model and the mediator model to compute the point estimates of natural 
direct and indirect effects without simultaneous estimation of the confidence intervals;
both approaches are prone to specification errors in the functional form of the outcome 
model and the mediator model; both procedures apply only to continuous outcomes that 
have a linear relationship with the mediator; and finally, in the presence of treatment- 
mediator interactions, neither procedure is suitable if a post-treatment covariate 
confounds the mediator-outcome relationship. 

Viewing the counterfactual outcomes as missing data, van der Laan and Petersen (2005,
2008) outlined a series of methods for estimating the natural direct effect. These included 
inverse-probability-of-censoring-weighted estimation method, double robust inverse- 
probability-of-censoring-weighted estimation method, and likelihood regression-based 
estimation method. All these methods require a user-supplied parameterization of the 
natural direct effect and additional modeling assumptions to obtain estimators with good 
practical performance. 

Imai and his colleagues (2010) developed a non-parametric estimation procedure under 
the assumption of sequential ignorability. In essence, the treatment is assumed to be 
ignorable given the pretreatment covariates; and the mediator is assumed to be ignorable 
given the corresponding treatment and the pretreatment covariates. These are equivalent 
to assumptions 1, 2, 4, 6, 7, and 8. The estimation involves stratifying a sample on 
pretreatment covariates. Within each stratum, the average causal mediated effect (i.e., the 
natural indirect effect) can be estimated by obtaining sample estimates of the conditional 
outcome associated with a given treatment and a given mediator value, the difference in 
the density of a given mediator value between the treatment group and the control group, 
and the product of these two quantities summed over all the mediator values. Finally, the 
causal effect estimate is obtained by aggregating the stratum-specific estimates. 
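
The stratified computation described in this paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not Imai and colleagues' implementation: the function name `stratified_nie` and the input layout are hypothetical, the covariates are assumed to be already collapsed into a categorical stratum label `x`, and the mediator is assumed discrete.

```python
from collections import defaultdict


def stratified_nie(rows):
    """Stratified estimate of the natural indirect effect E(Y_{1Z_1} - Y_{1Z_0}).

    rows: iterable of (x, a, z, y) with a categorical stratum label x,
    binary treatment a, discrete mediator z, and outcome y.
    """
    strata = defaultdict(list)
    for x, a, z, y in rows:
        strata[x].append((a, z, y))
    n_total = sum(len(units) for units in strata.values())

    estimate = 0.0
    for units in strata.values():
        treated = [(z, y) for a, z, y in units if a == 1]
        control = [(z, y) for a, z, y in units if a == 0]
        if not treated or not control:
            raise ValueError("stratum lacks treated or control units")
        z_values = {z for z, _ in treated} | {z for z, _ in control}
        stratum_effect = 0.0
        for z in z_values:
            y_treated_z = [y for zz, y in treated if zz == z]
            if not y_treated_z:
                raise ValueError("mediator value unobserved among treated units")
            mean_y = sum(y_treated_z) / len(y_treated_z)   # E(Y | A=1, Z=z, x)
            p1 = len(y_treated_z) / len(treated)           # pr(Z=z | A=1, x)
            p0 = sum(1 for zz, _ in control if zz == z) / len(control)
            stratum_effect += mean_y * (p1 - p0)
        # Aggregate stratum-specific estimates by stratum share of the sample.
        estimate += stratum_effect * len(units) / n_total
    return estimate
```

The function raises an error rather than guessing when a stratum lacks the cells needed for the conditional outcome mean, which is one simple way to surface violations of Assumptions 1 and 2.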

4. Ratio of Mediator Probability Weighting 
4-1 Statistical Adjustment for Pretreatment Covariates 

I propose an alternative non-parametric weighting approach for estimating both natural
direct and indirect effects. This new method invokes the same assumptions as those 
required by the conditional structural model approach described in the previous section. 
These assumptions suffice for estimating natural direct and indirect effects in the absence 
of post-treatment covariates. 

In general, the joint distribution of the observed data $O = (X, A, Z_A, Y_{AZ_A})$ can be
represented as

$f^{(a,z)}(Y_{az} \mid A = a, Z_a = z, X) \times q^{(a)}(Z_a = z \mid A = a, X) \times p(A = a \mid X) \times h(X)$,

where $f^{(a,z)}(\cdot)$, $q^{(a)}(\cdot)$, $p(\cdot)$, and $h(\cdot)$ are density functions. For simplicity, I use $f(\cdot)$ to
represent $f^{(a,z)}(\cdot)$ in the discussion below.

When A is binary, to estimate the natural direct effect $E(Y_{1Z_0} - Y_{0Z_0})$ and the natural
indirect effect $E(Y_{1Z_1} - Y_{1Z_0})$, we need an estimate of each of the three marginal mean
outcomes $E(Y_{0Z_0})$, $E(Y_{1Z_1})$, and $E(Y_{1Z_0})$. An unbiased estimate of $E(Y_{0Z_0})$ and $E(Y_{1Z_1})$
can each be obtained from the observed data under Assumptions 1 and 8. In its general
form,

$E(Y_{aZ_a}) = E\{E(Y_{aZ_a} \mid X)\} = E\{E(Y_{aZ_a} \mid A = a, X)\} = E(Y^* \mid A = a)$, (1)

where $Y^* = W_{(aZ_a)} Y$ and $W_{(aZ_a)} = p(A = a)/p(A = a \mid X)$ for all possible values of $a$.
This is equivalent to the inverse-probability-of-treatment weight (IPTW) used in marginal 
structural models (Robins, 1999; Rosenbaum, 1987). I will discuss in section 4-3 
alternative strategies for computing the weight to increase the robustness of the 
estimation results. 
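
To make Equation (1) concrete, here is a minimal sketch that computes the weight $p(A = a)/p(A = a \mid X)$ from observed cell proportions and then averages the weighted outcome within a treatment group. It assumes a single categorical covariate `x`; the function names and the data layout are hypothetical choices for this illustration.

```python
from collections import Counter


def iptw_weights(rows):
    """rows: list of (x, a) tuples; returns p(A=a) / p(A=a | X=x) for each row."""
    n = len(rows)
    n_a = Counter(a for _, a in rows)
    n_x = Counter(x for x, _ in rows)
    n_xa = Counter(rows)
    return [(n_a[a] / n) / (n_xa[(x, a)] / n_x[x]) for x, a in rows]


def weighted_group_mean(rows, y, weights, a_value):
    """Mean of W * Y among units with A = a_value, i.e. E(Y* | A = a)."""
    terms = [w * yi for (_, a), yi, w in zip(rows, y, weights) if a == a_value]
    return sum(terms) / len(terms)
```

With empirical cell proportions the weights within each treatment group sum to the group size, so the simple mean of `W * Y` coincides with standardizing the stratum-specific outcome means to the marginal distribution of `x`.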

I develop a different type of weighting, namely ratio of mediator probability
weighting, to estimate $E(Y_{1Z_0})$. Under Assumptions 4 and 7, that is, no confounding of
mediator-outcome relationship either within a treatment condition or across treatment
conditions, we have that

$f(Y_{az} = y \mid A = a, Z_{a'} = z, X = x) = f(Y_{az} = y \mid A = a, Z_a = z, X = x)$.

In other words, within levels of the pretreatment covariates, the counterfactual outcome
$Y_{1Z_0}$ of experimental units when their counterfactual mediator $Z_0$ would display value $z$ is
assumed to be the same in distribution as their observed outcome $Y_{1Z_1}$ when the observed
mediator $Z_1$ actually displays value $z$. Moreover, under Assumption 6, the experimental
group and the control group are exchangeable in distribution of $Z_0$ within levels of the
pretreatment covariates. Hence, the basic rationale is to take the integral of the observed
outcome of the experimental units over the conditional distribution of $Z_0$,
$q^{(0)}(Z_0 = z \mid A = 0, X)$. This is equivalent to assigning a weight to each experimental unit such that
the weighted distribution of $Z_1$ in the experimental group approximates the distribution of
$Z_0$ in the control group within levels of the pretreatment covariates. The weight to be
applied for this purpose is

$q^{(0)}(Z_0 = z \mid A = 0, X) / q^{(1)}(Z_1 = z \mid A = 1, X)$.

In addition, we apply weight $p(A = 1)/p(A = 1 \mid X)$ to make the experimental group and
control group exchangeable in pretreatment composition.

THEOREM 1. Under Assumptions 1, 2, 4, 6, 7, and 8, $E(Y^* \mid A = 1) = E(W_{(1Z_0)} Y \mid A = 1)$
is an observed data estimand for $E(Y_{1Z_0})$, where $W_{(1Z_0)}$ is equal to

$\{q^{(0)}(Z_0 = z \mid A = 0, X) / q^{(1)}(Z_1 = z \mid A = 1, X)\} \times \{p(A = 1)/p(A = 1 \mid X)\}$. (2)

In general, to estimate the expectation of the counterfactual outcome $E(Y_{aZ_{a'}})$ in the
absence of post-treatment covariates, the weight is

$W_{(aZ_{a'})} = \{q^{(a')}(Z_{a'} = z \mid A = a', X) / q^{(a)}(Z_a = z \mid A = a, X)\}
\times \{p(A = a)/p(A = a \mid X)\}$.

See Appendix 1 for the proof. 
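
The weight in Equation (2) can be illustrated with a short sketch. The code below assumes a categorical covariate `x` and a binary mediator, estimates every conditional probability by the observed cell proportion, and returns the three marginal means together with the implied natural direct and indirect effects. The function name `rmpw_estimates`, the dictionary keys, and the input layout are hypothetical; this is one possible reading of the weighting idea, not a reference implementation.

```python
from collections import Counter


def rmpw_estimates(rows):
    """rows: list of (x, a, z, y) with categorical x and binary a, z."""
    n = len(rows)
    n_a = Counter(a for _, a, _, _ in rows)
    n_x = Counter(x for x, _, _, _ in rows)
    n_xa = Counter((x, a) for x, a, _, _ in rows)
    n_xaz = Counter((x, a, z) for x, a, z, _ in rows)

    def q(z, a, x):
        # Empirical pr(Z = z | A = a, X = x); fails if the (x, a) cell is empty.
        return n_xaz[(x, a, z)] / n_xa[(x, a)]

    def iptw(a, x):
        # p(A = a) / p(A = a | X = x)
        return (n_a[a] / n) / (n_xa[(x, a)] / n_x[x])

    sums = {"Y1Z1": 0.0, "Y0Z0": 0.0, "Y1Z0": 0.0}
    for x, a, z, y in rows:
        if a == 1:
            sums["Y1Z1"] += iptw(1, x) * y
            # Ratio-of-mediator-probability weight times the IPTW factor.
            sums["Y1Z0"] += (q(z, 0, x) / q(z, 1, x)) * iptw(1, x) * y
        else:
            sums["Y0Z0"] += iptw(0, x) * y

    means = {
        "Y1Z1": sums["Y1Z1"] / n_a[1],
        "Y0Z0": sums["Y0Z0"] / n_a[0],
        "Y1Z0": sums["Y1Z0"] / n_a[1],
    }
    means["NDE"] = means["Y1Z0"] - means["Y0Z0"]
    means["NIE"] = means["Y1Z1"] - means["Y1Z0"]
    return means
```

A `ZeroDivisionError` here signals an empty treatment-by-covariate cell, that is, an apparent violation of Assumption 1 in the sample.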

4-2 Statistical Adjustment for Both Pretreatment and Post-Treatment 
Covariates 

Let $L_a$ be a vector of post-treatment covariates that confound the mediator-outcome
relationship when treatment $a$ is given. In general, the joint distribution of the observed
data $O = (X, A, L_A, Z_A, Y_{AZ_A})$ can be represented as

$f(Y_{az} \mid A = a, Z_a = z, X, L_a) \times q^{(a)}(Z_a = z \mid A = a, X, L_a) \times g^{(a)}(L_a = l \mid A = a, X)
\times p(A = a \mid X) \times h(X)$.

To adjust for the confounding effects of both pretreatment and post-treatment covariates, 
I modify the Assumptions 2, 4, and 7 for all values of a, a', and z: 

Assumption 2*. (Nonzero probability of mediator value assignment within a treatment).
$0 < \mathrm{pr}(Z_a = z \mid A, X, L_a) < 1$.

Assumption 4*. (No confounding of mediator-outcome relationship within a treatment
condition). $Y_{az} \perp\!\!\!\perp Z_a \mid A = a, X, L_a$.

Assumption 7*. (No confounding of mediator-outcome relationship across treatment
conditions). $Y_{az} \perp\!\!\!\perp Z_{a'} \mid A = a, X, L_a$.

In addition, I introduce the following identification assumption:

Assumption 9. (Independence of counterfactual mediator and post-treatment covariates
across treatment conditions). $Z_{a'} \perp\!\!\!\perp L_a \mid A = a, X_{(ix)}$.

Here $X_{(ix)}$ represents a set of pretreatment covariates that may or may not overlap with
the pretreatment covariates $X$ required for other assumptions stated above. We use $X^+$ to
denote the union of sets of pretreatment covariates $X$ and $X_{(ix)}$.

Even though $Z_{a'}$ and $L_a$ are often related by common causes, Assumption 9 can be made
plausible in some settings especially under an appropriate research design. In our
application example, a child’s pretreatment health conditions are presumably among the
most important pretreatment covariates that may confound the relationship between the
child’s counterfactual attention level $Z_0$ at kindergarten entry as a result of growing up in
a non-Head Start setting and the child’s health condition $L_1$ as a result of attending Head
Start. Moreover, if low-income children from the same neighborhood are assigned at
random to Head Start programs versus the control condition, a multi-site randomized
design effectively adjusts for neighborhood level confounding factors for the relationship
between $Z_0$ and $L_1$. After statistical adjustment for children’s pretreatment health
conditions among other pretreatment covariates in a multi-site randomized experiment,
the assumption that a child’s counterfactual attention level $Z_0$ does not depend on the
child’s health condition $L_1$ may hold approximately. In the presence of post-treatment
covariates that confound the mediator-outcome relationships, Assumption 9 appears to be
weaker than assuming that such post-treatment covariates do not exist.

When A is binary, again we obtain an unbiased estimate of $E(Y_{0Z_0})$ and $E(Y_{1Z_1})$ from the
observed data under Assumptions 1 and 8 in a form similar to that shown in Equation (1).

THEOREM 2. Under Assumptions 1, 2*, 4*, 6, 7*, 8, and 9, $E(Y^* \mid A = 1) =
E(W_{(1Z_0)} Y \mid A = 1)$ is an observed data estimand for $E(Y_{1Z_0})$ in the presence of
post-treatment covariates, where $W_{(1Z_0)}$ is equal to

$\{q^{(0)}(Z_0 = z \mid A = 0, X^+) / q^{(1)}(Z_1 = z \mid A = 1, X^+, L_1)\} \times \{p(A = 1)/p(A = 1 \mid X^+)\}$. (3)

In general, to estimate the expectation of the counterfactual outcome $E(Y_{aZ_{a'}})$ in the
presence of post-treatment covariates, the weight is

$W_{(aZ_{a'})} = \{q^{(a')}(Z_{a'} = z \mid A = a', X^+) / q^{(a)}(Z_a = z \mid A = a, X^+, L_a)\}
\times \{p(A = a)/p(A = a \mid X^+)\}$.

See Appendix 2 for the proof. 

The assumptions listed in Theorem 2 are required for estimating the expected value of the
counterfactual outcome $E(Y_{1Z_0})$. Specifically, under Assumptions 4* and 7*, we have
that

$f(Y_{az} = y \mid A = 1, Z_0 = z, X^+ = x, L_1 = l) = f(Y_{az} = y \mid A = 1, Z_1 = z, X^+ = x, L_1 = l)$.

Under Assumption 9, we have that

$q^{(0)}(Z_0 = z \mid A = 1, X^+ = x, L_1 = l) = q^{(0)}(Z_0 = z \mid A = 1, X^+ = x)$.

The latter is then equal to $q^{(0)}(Z_0 = z \mid A = 0, X^+ = x)$ under Assumption 6. The ratio of
mediator probability weight to be applied in this case is

$q^{(0)}(Z_0 = z \mid A = 0, X^+) / q^{(1)}(Z_1 = z \mid A = 1, X^+, L_1)$.

Under the assumptions stated above, the weighted distribution of $Z_1$ under the
experimental condition approximates the distribution of $Z_0$ under the control condition.
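
The following sketch shows how the weight in Equation (3) might be computed when $X^+$ and $L_1$ are categorical, again using observed cell proportions. The function name and the row layout `(x, a, l, z)` are assumptions of this example; note that the numerator conditions on `x` only, while the denominator conditions on both `x` and `l`.

```python
from collections import Counter


def rmpw_post_treatment_weights(rows):
    """rows: list of (x, a, l, z) tuples. Returns a weight for each treated
    unit (a == 1) and None for each control unit."""
    n = len(rows)
    n_a = Counter(a for _, a, _, _ in rows)
    n_x = Counter(x for x, _, _, _ in rows)
    n_xa = Counter((x, a) for x, a, _, _ in rows)
    n_xaz = Counter((x, a, z) for x, a, _, z in rows)
    n_xal = Counter((x, a, l) for x, a, l, _ in rows)
    n_xalz = Counter(rows)

    weights = []
    for x, a, l, z in rows:
        if a != 1:
            weights.append(None)
            continue
        # Numerator: pr(Z_0 = z | A = 0, X+ = x); the control units' own
        # post-treatment covariate values are not used.
        numerator = n_xaz[(x, 0, z)] / n_xa[(x, 0)]
        # Denominator: pr(Z_1 = z | A = 1, X+ = x, L_1 = l).
        denominator = n_xalz[(x, 1, l, z)] / n_xal[(x, 1, l)]
        iptw = (n_a[1] / n) / (n_xa[(x, 1)] / n_x[x])
        weights.append(numerator / denominator * iptw)
    return weights
```

One quick check of such weights is that, within a level of `x`, the weighted share of each mediator value among treated units reproduces the corresponding share among control units.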

4-3 Analytic Procedure 

If $A$, $Z_A$, $X$, and $L_A$ are all categorical variables, the conditional probabilities in the
numerator and those in the denominator of the weight can be empirically determined by 
the observed proportions of units in the corresponding cells. This process will reveal any 
violations of Assumptions 1 and 2 or 2* in the observed data; and if so, the analytic 
sample will be redefined accordingly by excluding units that show zero probability of 
assignment to a certain treatment or to a certain mediator value under a given treatment. 
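
A minimal sketch of this screening step, assuming categorical data stored as `(x, a, z)` tuples, is given below. The function name and the returned structure are hypothetical; the idea is only to list the covariate cells in which some treatment level, or some mediator value within a treatment, is never observed.

```python
from collections import defaultdict


def positivity_violations(rows, a_values=(0, 1), z_values=(0, 1)):
    """rows: list of (x, a, z). Returns (treatment_gaps, mediator_gaps).

    treatment_gaps: (x, a) pairs where treatment a never occurs in cell x.
    mediator_gaps: (x, a, z) triples where value z never occurs in cell (x, a).
    """
    seen_a = defaultdict(set)   # x -> treatments observed in that cell
    seen_z = defaultdict(set)   # (x, a) -> mediator values observed
    for x, a, z in rows:
        seen_a[x].add(a)
        seen_z[(x, a)].add(z)
    treatment_gaps = sorted(
        (x, a) for x in seen_a for a in a_values if a not in seen_a[x]
    )
    mediator_gaps = sorted(
        (x, a, z) for (x, a) in seen_z for z in z_values if z not in seen_z[(x, a)]
    )
    return treatment_gaps, mediator_gaps
```

Units falling in the flagged cells are candidates for exclusion when the analytic sample is redefined.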

In general, for a binary treatment, we can analyze a logistic regression to obtain an
estimate of the propensity of being assigned to a particular treatment condition, $\theta_A(x) =
\mathrm{pr}(A = a \mid X = x)$ (Rosenbaum & Rubin, 1983). Similarly, for a binary mediator, logistic
regression can be employed to estimate the propensity of having the mediator value under
the given treatment condition, $\theta_{Z_0}(x) = \mathrm{pr}(Z_0 = z \mid A = 0, X = x)$, $\theta_{Z_1}(x) =
\mathrm{pr}(Z_1 = z \mid A = 1, X = x)$, or $\theta_{Z_1}(x, l) = \mathrm{pr}(Z_1 = z \mid A = 1, X^+ = x, L_1 = l)$. The
propensity score of a mediator measured on an ordinal, categorical, or continuous scale
can be estimated by employing alternative strategies proposed in the literature (Huang et
al, 2005; Joffe & Rosenbaum, 1999; Imai & van Dyk, 2004; Imbens, 2000; Lu, Zanutto,
Hornik, & Rosenbaum, 2001; Zanutto, Lu, & Hornik, 2005).

Here I propose a non-parametric approach to estimating the weight by stratifying the 
sample on each propensity score. Suppose that we divide the sample into five strata on 
the basis of $\theta_{Z_0}(x)$, denoted by $S_0 = 1, \ldots, 5$. We then divide the same 
sample into another five strata on the basis of $\theta_{Z_1}(x)$, denoted by 
$S_1 = 1, \ldots, 5$. In order to estimate $E(Y_{1Z_0})$ when $A$ is randomized, the 
ratio-of-mediator-probability weight for a sampled unit $i$ is simply 

$$\frac{n_{S_0, Z_0 = 1} / n_{S_0}}{n_{S_1, Z_1 = 1} / n_{S_1}}.$$

Here $n_{S_0}$ is the number of sampled units in the same propensity stratum with unit $i$ 
when the sample is stratified on $\theta_{Z_0}(x)$; $n_{S_0, Z_0 = 1}$ is the number of 
sampled units in the same propensity stratum that displayed the mediator value 
$Z_0 = 1$; $n_{S_1}$ is the number of sampled units in the same propensity stratum with 
unit $i$ when the sample is stratified on $\theta_{Z_1}(x)$; and $n_{S_1, Z_1 = 1}$ is the 
number of sampled units in the same propensity stratum that displayed the mediator 
value $Z_1 = 1$. Recent results have shown that weighting on the basis of propensity 
score stratification produces causal effect estimates that are robust to 
misspecifications of the propensity score model, especially in comparison with IPTW 
estimates (Hong, in press). The non-parametric weighting approach usually provides a 
better approximation of nonlinear or non-additive relationships between treatment 
assignment and pretreatment covariates and has a built-in procedure for excluding units 
that do not have counterfactual information in the observed data. 
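
The stratification procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the treatment is randomized and the mediator is binary; the two propensity scores (`theta0`, `theta1`) have already been estimated for every unit; strata are equal-sized and rank-based; the stratum counts for $Z_0$ are taken among control units and those for $Z_1$ among experimental units; and a unit with mediator value 0 receives the complementary ratio. The function names are hypothetical.

```python
def rank_strata(scores, n_strata=5):
    """Assign each score to one of n_strata equal-sized, rank-based strata (1..n_strata)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores) + 1
    return strata


def stratum_rate(strata, a, z, arm):
    """Proportion of units with Z = 1 in each stratum, among units with A == arm."""
    n, n_z1 = {}, {}
    for s, a_i, z_i in zip(strata, a, z):
        if a_i == arm:
            n[s] = n.get(s, 0) + 1
            n_z1[s] = n_z1.get(s, 0) + z_i
    return {s: n_z1[s] / n[s] for s in n}


def stratified_weights(a, z, theta0, theta1, n_strata=5):
    """Stratification-based weight for each experimental unit.

    Returns one entry per unit: None for control units and for experimental
    units whose stratum on theta0 contains no control unit (assumption: such
    units lack counterfactual information and would be excluded).
    """
    s0 = rank_strata(theta0, n_strata)
    s1 = rank_strata(theta1, n_strata)
    p0 = stratum_rate(s0, a, z, arm=0)  # n_{S0, Z0=1} / n_{S0}
    p1 = stratum_rate(s1, a, z, arm=1)  # n_{S1, Z1=1} / n_{S1}
    weights = []
    for i in range(len(a)):
        numerator = p0.get(s0[i])
        if a[i] != 1 or numerator is None:
            weights.append(None)
        elif z[i] == 1:
            weights.append(numerator / p1[s1[i]])
        else:
            weights.append((1 - numerator) / (1 - p1[s1[i]]))
    return weights
```

The denominator cannot be zero for the unit being weighted, because that unit itself contributes to the observed proportion of its own mediator value within its stratum.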

The estimation involves fitting a weighted outcome model as a function of the natural 
direct effect and the natural indirect effect of interest with minimal model-based 
assumptions. Each natural effect is represented by a single parameter in the outcome 
model. When the treatment is binary, I reconstruct the data set to include the sampled 
control units, the sampled experimental units, and a duplicate set of the experimental 
units. Let $D$ be a dummy indicator that takes value 1 for the duplicate experimental 
units and 0 otherwise. We assign the weight as follows: $W = W_{(0Z_0)}$ if $A = 0$ and 
$D = 0$; $W = W_{(1Z_0)}$ if $A = 1$ and $D = 0$; and $W = W_{(1Z_1)}$ if $A = 1$ and 
$D = 1$. Here $W_{(0Z_0)} = p(A = 0)/p(A = 0 \mid X)$; $W_{(1Z_1)} = p(A = 1)/p(A = 1 \mid X)$; 
and $W_{(1Z_0)}$ is defined in Equation (2) if there are no post-treatment covariates and 
in Equation (3) with post-treatment covariates. To simultaneously estimate the natural 
direct and indirect effects on a continuous outcome, we fit a weighted regression 

$$Y = \gamma_0 + A\gamma^{(ND)} + AD\gamma^{(NI)} + e.$$

In the above model, $\gamma_0$ represents $E(Y_{0Z_0})$; $\gamma^{(ND)}$ represents the 
natural direct effect $E(Y_{1Z_0} - Y_{0Z_0})$; and $\gamma^{(NI)}$ represents the natural 
indirect effect $E(Y_{1Z_1} - Y_{1Z_0})$. When the outcome is binary, as in estimating the 
natural direct and indirect effects of attending a Head Start program on a child’s 
likelihood of repeating kindergarten, we may instead fit a weighted generalized linear 
model with a logit link function. We use estimated robust standard errors to compute the 
respective confidence intervals for $\gamma^{(ND)}$ and $\gamma^{(NI)}$. Alternatively, 
researchers may use the bootstrap to obtain an estimate of the standard error. To improve 
the precision of estimation, the outcome model may include pretreatment covariates that 
are strong predictors of the outcome. 
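
To make the data reconstruction concrete, the sketch below stacks the control units, the experimental units, and the duplicate experimental units, and then recovers the three parameters of the weighted regression for a continuous outcome. It relies on the fact that, with only the indicators for $A$ and $AD$ and no additional covariates, the model is saturated, so the weighted least squares solution coincides with differences of weighted group means. The function names and argument layout are hypothetical; the weight for $E(Y_{1Z_0})$ is assumed to have been computed beforehand, and the sketch returns point estimates only (no robust or bootstrap standard errors).

```python
def stack_units(a, y, w_1z0, w_iptw=None):
    """Build the reconstructed data set as rows of (A, D, W, Y).

    a, y    : treatment indicator and outcome per sampled unit
    w_1z0   : precomputed weight W_(1Z0) per unit (ignored for control units)
    w_iptw  : p(A = a_i) / p(A = a_i | X_i) per unit; defaults to 1.0,
              which corresponds to a randomized treatment (assumption)
    """
    if w_iptw is None:
        w_iptw = [1.0] * len(a)
    rows = []
    for i in range(len(a)):
        if a[i] == 0:
            rows.append((0, 0, w_iptw[i], y[i]))  # control unit, W_(0Z0)
        else:
            rows.append((1, 0, w_1z0[i], y[i]))   # experimental unit, W_(1Z0)
            rows.append((1, 1, w_iptw[i], y[i]))  # duplicate unit, W_(1Z1)
    return rows


def weighted_group_mean(rows, a, d):
    """Weighted mean of Y among rows with the given (A, D) combination."""
    group = [(w, y) for a_i, d_i, w, y in rows if (a_i, d_i) == (a, d)]
    return sum(w * y for w, y in group) / sum(w for w, _ in group)


def natural_effects(rows):
    """Point estimates for Y = gamma_0 + A*gamma_ND + A*D*gamma_NI + e."""
    m_0z0 = weighted_group_mean(rows, 0, 0)  # estimates E(Y_0Z0)
    m_1z0 = weighted_group_mean(rows, 1, 0)  # estimates E(Y_1Z0)
    m_1z1 = weighted_group_mean(rows, 1, 1)  # estimates E(Y_1Z1)
    return {"gamma_0": m_0z0, "gamma_ND": m_1z0 - m_0z0, "gamma_NI": m_1z1 - m_1z0}
```

With two control outcomes 1.0 and 3.0 and two experimental outcomes 4.0 and 8.0 weighted by 0.5 and 1.5, `natural_effects(stack_units([0, 0, 1, 1], [1.0, 3.0, 4.0, 8.0], [None, None, 0.5, 1.5]))` gives `gamma_0 = 2.0`, `gamma_ND = 5.0`, and `gamma_NI = -1.0`. Because each experimental unit appears twice, the stacked rows are not independent, which is why the text calls for robust or bootstrap standard errors.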


5. Conclusion 

Pearl’s (2001) formulation of natural direct and indirect effects promises to inspire a 
new class of scientific questions. Allowing the mediator value under 
each treatment condition to vary naturally among units in the population, the natural 
direct effect provides a useful summary of the treatment effect on the outcome when the 
mediator value remains unchanged by the treatment, while the natural indirect effect 
summarizes the treatment effect on the outcome attributable to the treatment-induced 
change in the mediator value. However, applications of this new formulation have been 
rare due to the practical challenges in implementing the existing methods. 

How individuals respond to the treatment assignment at the intermediate stage typically 
reflects their pretreatment characteristics that may also predict their outcome at a fixed 
level of the mediator value. Hence treatment-mediator interactions are highly plausible. 
In the meantime, post-treatment covariates may confound the mediator-outcome 
relationships in some experimental and non-experimental studies of mediation. This is 
because the mediator as an intermediate outcome of the treatment could have been 
influenced by many other processes happening in between. Ignoring either treatment- 
mediator interactions or post-treatment covariates could lead to misleading results in 
mediation studies. 

I have proposed a weighting approach that accommodates treatment-mediator 
interactions and adjusts for both pretreatment and post-treatment covariates in estimating 
the marginal mean of each counterfactual outcome. In particular, in order to estimate 
$E(Y_{1Z_0})$, the weight is computed as a ratio of the conditional probability of a mediator 
value under the control condition to the conditional probability of the same mediator 
value under the experimental condition. The weighted distribution of the mediator under 
the experimental condition approximates the counterfactual mediator distribution under 
the control condition. After weighting, the natural direct effect and the natural indirect 
effect of interest are each represented by a single parameter in a non-parametric outcome 
model and are estimated simultaneously. 

Through weighted estimation of counterfactual means, this new approach displays a 
number of potentially important advantages over the existing methods for estimating 
natural direct and indirect effects. First of all, because this new approach estimates the 
natural direct effect without taking an average over the controlled direct effects, it does 
not require the assumption of no interaction effect of treatment and mediator on the 
outcome. Secondly, this new method does not require combining multiple parametric 
models as has been the case in all other existing methods. Thirdly, the non-parametric 
outcome model avoids model specification errors and applies regardless of the 
distribution of the outcome or the functional relationship between the outcome and the 
mediator. Hence unlike most other existing methods, it does not require an explicit linear 
or nonlinear relationship between the outcome and the mediator. Fourthly, the weighting 
approach enables adjustment for a large number of covariates without reducing the 
degrees of freedom in analyzing the outcome model. Fifthly, the non-parametric 
approach to estimating the weight enhances the robustness of the causal effect estimates 
even if the parametric model for each mediator is misspecified. Finally, this new 
weighting approach enables researchers to adjust for post-treatment covariates that 
confound the mediator-outcome relationships. Other existing methods require the 
assumption of no such post-treatment confounders, which may limit their applications. 
Due to these important features, the non-parametric approach through weighting enables 
researchers to investigate a significantly broader range of scenarios than most of the 
existing methods can handle. In the absence of post-treatment covariates, the 
performance of the ratio-of-mediator-probability weighting method relative to other 
non-parametric and parametric methods has yet to be assessed through simulations. 
Future research may extend this approach to studies of multi-valued treatments, multiple 
mediators, time-varying treatments, time-varying moderators, and time-varying mediators. 
Appendix 1 

Proof of Theorem 1 

Theorem 1 requires that we derive a weight $W_{(1Z_0)}$ such that $E(Y_{1Z_0})$ can be 
consistently estimated by $E(W_{(1Z_0)} Y \mid A = 1)$. 

$$E(Y_{1Z_0}) = E\{E(Y_{1Z_0} \mid X)\}.$$

By Assumption 8, the above is equal to 

$$E\{E(Y_{1Z_0} \mid A = 1, X)\} = \iiint_{x,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_0 = z, X = x) \times q^{(0)}(Z_0 = z \mid A = 1, X = x) \times h(X = x)\, dy\, dz\, dx,$$

which, by Assumptions 4, 6, and 7, is equal to 

$$\iiint_{x,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_1 = z, X = x) \times q^{(0)}(Z_0 = z \mid A = 0, X = x) \times h(X = x)\, dy\, dz\, dx,$$

which, by Bayes' theorem and Assumptions 1 and 2, is equal to 

$$\iiint_{x,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_1 = z, X = x) \times q^{(1)}(Z_1 = z \mid A = 1, X = x) \times h(X = x \mid A = 1) \times \frac{q^{(0)}(Z_0 = z \mid A = 0, X = x)}{q^{(1)}(Z_1 = z \mid A = 1, X = x)} \times \frac{p(A = 1)}{p(A = 1 \mid X = x)}\, dy\, dz\, dx = E(Y^{*} \mid A = 1),$$

where $Y^{*} = W_{(1Z_0)} Y$ and 

$$W_{(1Z_0)} = \{q^{(0)}(Z_0 = z \mid A = 0, X) / q^{(1)}(Z_1 = z \mid A = 1, X)\} \times \{p(A = 1) / p(A = 1 \mid X)\}.$$

This concludes the proof. □ 
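
The identity established in the proof can be checked numerically on a small discrete example. The sketch below is only an illustration: the population quantities are made-up numbers, the covariate and the mediator are both binary, and all names are hypothetical. It uses exact rational arithmetic to confirm that the weighted mean among experimental units equals the target counterfactual mean, whereas the unweighted mean among experimental units does not.

```python
from fractions import Fraction as F

# Hypothetical population quantities (illustrative numbers only).
h = {0: F(1, 2), 1: F(1, 2)}          # h(X = x)
p_a1 = {0: F(1, 4), 1: F(3, 4)}       # p(A = 1 | X = x)
q0 = {0: F(1, 5), 1: F(2, 5)}         # pr(Z_0 = 1 | A = 0, X = x)
q1 = {0: F(3, 5), 1: F(4, 5)}         # pr(Z_1 = 1 | A = 1, X = x)
mu1 = {(0, 0): F(1), (0, 1): F(4),    # E(Y_1z | A = 1, Z_1 = z, X = x), keyed by (x, z)
       (1, 0): F(2), (1, 1): F(7)}

CELLS = [(x, z) for x in (0, 1) for z in (0, 1)]
P_A1 = sum(h[x] * p_a1[x] for x in h)  # marginal p(A = 1)


def prob(table, x, z):
    """Probability of mediator value z given x, from a table of pr(Z = 1 | x)."""
    return table[x] if z == 1 else 1 - table[x]


def h_given_a1(x):
    """h(X = x | A = 1), obtained from h(x) by Bayes' theorem."""
    return h[x] * p_a1[x] / P_A1


def weight(x, z):
    """Mediator probability ratio times p(A = 1) / p(A = 1 | X = x)."""
    return prob(q0, x, z) / prob(q1, x, z) * P_A1 / p_a1[x]


def target_mean():
    """E(Y_1Z0): outcome under treatment, mediator distributed as under control."""
    return sum(h[x] * prob(q0, x, z) * mu1[(x, z)] for x, z in CELLS)


def weighted_mean_treated():
    """E(W Y | A = 1), averaging over the observed distribution of treated units."""
    return sum(h_given_a1(x) * prob(q1, x, z) * weight(x, z) * mu1[(x, z)]
               for x, z in CELLS)


def naive_mean_treated():
    """E(Y | A = 1) without any weighting, for comparison."""
    return sum(h_given_a1(x) * prob(q1, x, z) * mu1[(x, z)] for x, z in CELLS)
```

With these numbers, `target_mean()` and `weighted_mean_treated()` both equal 14/5, while `naive_mean_treated()` equals 26/5.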


Appendix 2 
Proof of Theorem 2 

Theorem 2 requires that we derive a weight $W_{(1Z_0)}$ such that $E(Y_{1Z_0})$ can be 
consistently estimated by $E(W_{(1Z_0)} Y \mid A = 1)$. 

$$E(Y_{1Z_0}) = E\{E(Y_{1Z_0} \mid X^{+})\}.$$

By Assumption 8, the above is equal to 

$$E\{E(Y_{1Z_0} \mid A = 1, X^{+})\} = \iiiint_{x,l,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_0 = z, X^{+} = x, L_1 = l) \times q^{(0)}(Z_0 = z \mid A = 1, X^{+} = x, L_1 = l) \times g^{(1)}(L_1 = l \mid A = 1, X^{+} = x) \times h(X^{+} = x)\, dy\, dz\, dl\, dx,$$

which, by Assumptions 4* and 7*, is equal to 

$$\iiiint_{x,l,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_1 = z, X^{+} = x, L_1 = l) \times q^{(0)}(Z_0 = z \mid A = 1, X^{+} = x, L_1 = l) \times g^{(1)}(L_1 = l \mid A = 1, X^{+} = x) \times h(X^{+} = x)\, dy\, dz\, dl\, dx,$$

which, by Assumptions 9 and 6, is equal to 

$$\iiiint_{x,l,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_1 = z, X^{+} = x, L_1 = l) \times q^{(0)}(Z_0 = z \mid A = 0, X^{+} = x) \times g^{(1)}(L_1 = l \mid A = 1, X^{+} = x) \times h(X^{+} = x)\, dy\, dz\, dl\, dx,$$

which, by Bayes' theorem and Assumptions 1 and 2*, is equal to 

$$\iiiint_{x,l,z,y} y \times f(Y_{1z} = y \mid A = 1, Z_1 = z, X^{+} = x, L_1 = l) \times q^{(1)}(Z_1 = z \mid A = 1, X^{+} = x, L_1 = l) \times g^{(1)}(L_1 = l \mid A = 1, X^{+} = x) \times h(X^{+} = x \mid A = 1) \times \frac{q^{(0)}(Z_0 = z \mid A = 0, X^{+} = x)}{q^{(1)}(Z_1 = z \mid A = 1, X^{+} = x, L_1 = l)} \times \frac{p(A = 1)}{p(A = 1 \mid X^{+} = x)}\, dy\, dz\, dl\, dx = E(Y^{*} \mid A = 1),$$

where $Y^{*} = W_{(1Z_0)} Y$ and 

$$W_{(1Z_0)} = \{q^{(0)}(Z_0 = z \mid A = 0, X^{+}) / q^{(1)}(Z_1 = z \mid A = 1, X^{+}, L_1)\} \times \{p(A = 1) / p(A = 1 \mid X^{+})\}.$$

This concludes the proof. □ 